14 research outputs found

    Fast transcription of unstructured audio recordings

    URL to conference session list. Title is under heading: Wed-Ses1-P1: Phonetics, Phonology, cross-language comparisons, pathology. We introduce a new method for human-machine collaborative speech transcription that is significantly faster than existing transcription methods. In this approach, automatic audio processing algorithms are used to robustly detect speech in audio recordings and split speech into short, easy to transcribe segments. Sequences of speech segments are loaded into a transcription interface that enables a human transcriber to simply listen and type, obviating the need for manually finding and segmenting speech or explicitly controlling audio playback. As a result, playback stays synchronized to the transcriber's speed of transcription. In evaluations using naturalistic audio recordings made in everyday home situations, the new method is up to 6 times faster than other popular transcription tools while preserving transcription quality.

    The birth of a word

    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 185-192). A hallmark of a child's first two years of life is their entry into language, from first productive word use around 12 months of age to the emergence of combinatorial speech in their second year. What is the nature of early language development and how is it shaped by everyday experience? This work builds from the ground up to study early word learning, characterizing vocabulary growth and its relation to the child's environment. Our study is guided by the idea that the natural activities and social structures of daily life provide helpful learning constraints. We study this through analysis of the largest-ever corpus of one child's everyday experience at home. Through the Human Speechome Project, the home of a family with a young child was outfitted with a custom audio-video recording system, capturing more than 200,000 hours of audio and video of daily life from birth to age three. The annotated subset of this data spans the child's 9-24 month age range and contains more than 8 million words of transcribed speech, constituting a detailed record of both the child's input and linguistic development. Such a comprehensive, naturalistic dataset presents new research opportunities but also requires new analysis approaches - questions must be operationalized to leverage the full scale of the data. We begin with the task of speech transcription, then identify "word births" - the child's first use of each word in his vocabulary. Vocabulary growth accelerates and then shows a surprising deceleration that coincides with an increase in combinatorial speech. The vocabulary growth timeline provides a means to assess the environmental contributions to word learning, beginning with aspects of caregiver input speech.
    But language is tied to everyday activity, and we investigate how spatial and activity contexts relate to word learning. Activity contexts, such as "mealtime", are identified manually and with probabilistic methods that can scale to large datasets. These new nonlinguistic variables are predictive of when words are learned and are complementary to more traditionally studied linguistic measures. Characterizing word learning and assessing natural input variables can lead to new insights on fundamental learning mechanisms. by Brandon Cain Roy. Ph.D.

    Human-machine collaboration for rapid speech transcription

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. Includes bibliographical references (p. 121-127). Inexpensive storage and sensor technologies are yielding a new generation of massive multimedia datasets. The exponential growth in storage and processing power makes it possible to collect more data than ever before, yet without appropriate content annotation for search and analysis such corpora are of little use. While advances in data mining and machine learning have helped to automate some types of analysis, the need for human annotation still exists and remains expensive. The Human Speechome Project is a heavily data-driven longitudinal study of language acquisition. More than 100,000 hours of audio and video recordings have been collected over a two-year period to trace one child's language development at home. A critical first step in analyzing this corpus is to obtain high quality transcripts of all speech heard and produced by the child. Unfortunately, automatic speech transcription has proven to be inadequate for these recordings, and manual transcription with existing tools is extremely labor intensive and therefore expensive. A new human-machine collaborative system for rapid speech transcription has been developed which leverages both the quality of human transcription and the speed of automatic speech processing. Machine algorithms sift through the massive dataset to find and segment speech. The results of automatic analysis are handed off to humans for transcription using newly designed tools with an optimized user interface. The automatic algorithms are tuned to optimize human performance, and errors are corrected by the human and used to iteratively improve the machine performance. When compared with other popular transcription tools, the new system is three- to six-fold faster, while preserving transcription quality.
    When applied to the Speechome audio corpus, over 100 hours of multitrack audio can be transcribed in about 12 hours by a single human transcriber. by Brandon C. Roy. S.M.

    Exploring word learning in a high-density longitudinal corpus

    What is the role of the linguistic environment in children’s early word learning? Here we provide a preliminary analysis of one child’s linguistic development, using a portion of the high-density longitudinal data collected for the Human Speechome Project. We focus particularly on the development of the child’s productive vocabulary from the age of 9 to 24 months and the relationship between the child’s language development and the caregivers’ speech. We find significant correlations between input frequencies and age of acquisition for individual words. In addition, caregivers’ utterance length, type-token ratio, and proportion of single-word utterances all show significant temporal relationships with the child’s development, suggesting that caregivers “tune” their utterances to the linguistic ability of the child.

    Contributions of Prosodic and Distributional Features of Caregivers' Speech in Early Word Learning

    How do characteristics of caregiver speech contribute to a child's early word learning? We explore the relationship between a single child's vocabulary growth and the distributional and prosodic characteristics of the speech he hears using data collected for the Human Speechome Project, an ecologically valid corpus collected from the home of a family with a young child. We measured F0, intensity, phoneme duration, usage frequency, recurrence, and mean length of utterance (MLU) for caregivers' production of each word that the child learned during the period of recording. When all variables are considered, we obtain a model of word acquisition as a function of caregiver input speech. Coefficient estimates in the model help to illuminate which factors are relevant to learning classes of words. In addition, words that deviate from the model's prediction are of interest as they may suggest important social, contextual and other cues relevant to word learning.

    Effects of Caregiver Prosody on Child Language Acquisition

    http://speechprosody2010.illinois.edu/program.php (conference site). This paper investigates the role of prosody in one child’s lexical acquisition using an ecologically valid, high-density, longitudinal corpus. The corpus consists of high fidelity recordings collected from microphones embedded throughout the home of a family with a young child. We analyze data collected continuously from ages 9 – 24 months, including the child’s first productive use of language at about 11 months and ending at the child’s active use of more than 500 words. We found significant correlations between prosody of caregivers’ speech and age of acquisition for individual words.

    Predicting the birth of a spoken word

    Children learn words through an accumulation of interactions grounded in context. Although many factors in the learning environment have been shown to contribute to word learning in individual studies, no empirical synthesis connects across factors. We introduce a new ultradense corpus of audio and video recordings of a single child’s life that allows us to measure the child’s experience of each word in his vocabulary. This corpus provides the first direct comparison, to our knowledge, between different predictors of the child’s production of individual words. We develop a series of new measures of the distinctiveness of the spatial, temporal, and linguistic contexts in which a word appears, and show that these measures are stronger predictors of learning than frequency of use and that, unlike frequency, they play a consistent role across different syntactic categories. Our findings provide a concrete instantiation of classic ideas about the role of coherent activities in word learning and demonstrate the value of multimodal data in understanding children’s language acquisition.

    The Emergence of an Abstract Grammatical Category in Children’s Early Speech

    How do children begin to use language to say things they have never heard before? The origins of linguistic productivity have been a subject of heated debate: Whereas generativist accounts posit that children’s early language reflects the presence of syntactic abstractions, constructivist approaches instead emphasize gradual generalization derived from frequently heard forms. In the present research, we developed a Bayesian statistical model that measures the degree of abstraction implicit in children’s early use of the determiners “a” and “the.” Our work revealed that many previously used corpora are too small to allow researchers to judge between these theoretical positions. However, several data sets, including the Speechome corpus (a new ultra-dense data set for one child), showed evidence of low initial levels of productivity and higher levels later in development. These findings are consistent with the hypothesis that children lack rich grammatical knowledge at the outset of language learning but rapidly begin to generalize on the basis of structural regularities in their input.

    Severe hypoglycemia and diabetic ketoacidosis in adults with type 1 diabetes: results from the T1D Exchange clinic registry
